Background: Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information, but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them.

Results: SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data.

Conclusions: SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/.
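
The selection step described in the Results (pooling candidate models from several predictor runs and keeping the best-supported one at each locus) can be illustrated with a minimal Python sketch. This is not the SnowyOwl implementation: the `CandidateModel` fields, the two scores, the equal weighting, and the greedy overlap rule are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateModel:
    """One predicted gene model (hypothetical structure, for illustration)."""
    seq_id: str            # contig or chromosome name
    start: int             # 1-based inclusive start coordinate
    end: int               # 1-based inclusive end coordinate
    strand: str            # "+" or "-"
    homology_score: float  # assumed: higher = better match to known proteins
    rnaseq_score: float    # assumed: higher = better agreement with RNA-Seq


def combined_score(model, homology_weight=0.5):
    """Weighted sum of the two evidence scores (the weighting is an assumption)."""
    return (homology_weight * model.homology_score
            + (1.0 - homology_weight) * model.rnaseq_score)


def overlaps(a, b):
    """True if two models share at least one base on the same sequence and strand."""
    return (a.seq_id == b.seq_id and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def select_best_models(candidates, homology_weight=0.5):
    """Greedy selection: visit models from best to worst combined score and
    keep a model only if it overlaps nothing already kept."""
    ranked = sorted(candidates,
                    key=lambda m: combined_score(m, homology_weight),
                    reverse=True)
    kept = []
    for model in ranked:
        if not any(overlaps(model, k) for k in kept):
            kept.append(model)
    # Return the kept models in genomic order for readability.
    return sorted(kept, key=lambda m: (m.seq_id, m.start))


if __name__ == "__main__":
    # Candidates as they might be pooled from several predictor runs.
    pooled = [
        CandidateModel("contig1", 100, 900, "+", 0.9, 0.8),
        CandidateModel("contig1", 150, 950, "+", 0.6, 0.7),
        CandidateModel("contig1", 2000, 2600, "-", 0.2, 0.9),
    ]
    for m in select_best_models(pooled):
        print(m.seq_id, m.start, m.end, m.strand)
```

Running many predictor configurations enlarges the candidate pool (raising sensitivity), while a selection step of this kind discards weaker overlapping alternatives (raising selectivity); the real pipeline's scoring and grouping rules may differ substantially from this sketch.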